dropout rate
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Promising Solution (0.92)
- Law (1.00)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- North America > United States > Virginia (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (4 more...)
- Education (0.46)
- Information Technology > Security & Privacy (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Synergizing Deconfounding and Temporal Generalization For Time-series Counterfactual Outcome Estimation
Liu, Yiling, Dong, Juncheng, Fu, Chen, Shi, Wei, Jiang, Ziyang, Hua, Zhigang, Carlson, David
Estimating counterfactual outcomes from time-series observations is crucial for effective decision-making, e.g. when to administer a life-saving treatment, yet remains significantly challenging because (i) the counterfactual trajectory is never observed and (ii) confounders evolve with time and distort estimation at every step. To address these challenges, we propose a novel framework that synergistically integrates two complementary approaches: Sub-treatment Group Alignment (SGA) and Random Temporal Masking (RTM). Instead of the coarse practice of aligning marginal distributions of the treatments in latent space, SGA uses iterative treatment-agnostic clustering to identify fine-grained sub-treatment groups. Aligning these fine-grained groups achieves improved distributional matching, thus leading to more effective deconfounding. We theoretically demonstrate that SGA optimizes a tighter upper bound on counterfactual risk and empirically verify its deconfounding efficacy. RTM promotes temporal generalization by randomly replacing input covariates with Gaussian noises during training. This encourages the model to rely less on potentially noisy or spuriously correlated covariates at the current step and more on stable historical patterns, thereby improving its ability to generalize across time and better preserve underlying causal relationships. Our experiments demonstrate that while applying SGA and RTM individually improves counterfactual outcome estimation, their synergistic combination consistently achieves state-of-the-art performance. This success comes from their distinct yet complementary roles: RTM enhances temporal generalization and robustness across time steps, while SGA improves deconfounding at each specific time point.
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.67)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
Continuum Dropout for Neural Differential Equations
Lee, Jonghun, Oh, YongKyung, Kim, Sungil, Lim, Dong-Young
Neural Differential Equations (NDEs) excel at modeling continuous-time dynamics, effectively handling challenges such as irregular observations, missing values, and noise. Despite their advantages, NDEs face a fundamental challenge in adopting dropout, a cornerstone of deep learning regularization, making them susceptible to overfitting. To address this research gap, we introduce Continuum Dropout, a universally applicable regularization technique for NDEs built upon the theory of alternating renewal processes. Continuum Dropout formulates the on-off mechanism of dropout as a stochastic process that alternates between active (evolution) and inactive (paused) states in continuous time. This provides a principled approach to prevent overfitting and enhance the generalization capabilities of NDEs. Moreover, Continuum Dropout offers a structured framework to quantify predictive uncertainty via Monte Carlo sampling at test time. Through extensive experiments, we demonstrate that Continuum Dropout outperforms existing regularization methods for NDEs, achieving superior performance on various time series and image classification tasks. It also yields better-calibrated and more trustworthy probability estimates, highlighting its effectiveness for uncertainty-aware modeling.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > South Korea > Ulsan > Ulsan (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- (2 more...)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
Supplementary Material
It is worth noting that, Eq. In Section 4.1, we have shown the experimental results of HPM on two population synthetic functions, It is worth noting that, since the synthetic function only simulates the validation loss function ( i.e., The same exploit strategy in PBT, i.e., truncation selection [ All the codes on the synthetic functions were implemented with Autograd. Same to the Figure 1 in Section 4.1, we show the mean performance We show the details of hyperparameters we tuned on the benchmark datasets as follows. The tied weight is used for the embedding and softmax layer.